Introduction

I wanted to focus on using bivariate mapping to display how one variable varys with another, across space. After making bivariate maps that were static, I wanted to see if I could add more information about the data, or an additional dataset using some interactive features on the map. The purpose of the interactive map would be to better visualize complex datasets, as well as increase accessibilty for the public and policymakers. The next step would be to employ the same techniques in a shiny application for publication in an appropriate outlet. Here, I’m using publically available data that was used in the demonstration I followed, household income, personal income, and the Gini Coefficient. This data was derived from the 2017 American Community Survey 5-Year Estimates.

Psuedocode

  1. Load and Organize data. Need to have a working census API key to access data. I’m using two datasets that I want to see vary together household income and personal income, as well as the Gini Coefficient which I want to see as a popup on the map. I want a spatial dataframe in sf form to play nice with tidycensus, with each row (44) to be an Idaho county and each column (3) to be a variable, as well as the GEOID as a unique identifier, the pretty name of the County, and geometries for each county.

  2. CRITICAL Create bins or ‘breaks’ that will indicate which level of income or Gini Coefficient the county falls into, compared to all the counties in the state of Idaho. It’s important that these are calculated to show what I want them to, because everything else depends on these bins. Jenks, Quartile, Fisher are all options to use.

  3. Map the incomes based on the breaks I calculated.

  4. Add interactive features (i.e., display actual values of the variables)

  5. Display the map, check the values from the table to the mouseover

Required libraries

#load the libraries - tell us which packages you're using and why
library(tidycensus) #load the ACS data; note - you'll need your own census keycode to access the data
library(dplyr) #data wrangling 
library(magrittr) #data wrangling
library(sf) #we are using tidycensus which will return a sf-class dataframe
library(ggplot2) #plotting the data
library(leaflet) #interactive map
library(leaflet.extras) #extra goodies for prettying up the interactive map 
library(htmlwidgets)
library(htmltools)

Evaluate your choices

Use profiling and benchmarking to evaluate which of your options is likely to be the fastest. How does the syntax and/or ease of use of that function impact your decision of whether or not to use it? (For example, velox is much faster than raster, but it’s less well documented and the syntax is strange to get used to).

# Named vector of ACS tables to get from Census API
varlist <- c("hhinc" = "B19013_001",
             "pearn" = "B20002_001",
             "gini" = "B19083_001")

# tract-level summary data from Census API
idaho <- get_acs(
  geography = "county",
  state = "ID",
  variables = varlist,
  survey = "acs5",
  year = 2017,
  output = "wide",
  geometry = TRUE, 
  key = "9af94992069cef5ed20227c7167a5916216d9705"
)

# drop margins of error
idaho <- idaho[, c(1, 2, seq(3, 9, by = 2))]

# re-project to play nice with leaflet
idaho <- st_transform(idaho, 4326, type = "datum")

# get median hh income for SF-Oakland-Hayward MSA as baseline for breaks
# returns 92714 - retain in case API is down
hhinc_idaho <- get_acs(
  geography = "state",
  variables = c("hhinc" = "B19013_001"),
  survey = "acs5",
  year = 2017,
  key = "9af94992069cef5ed20227c7167a5916216d9705"
) %>% 
  select(estimate) %>% 
  pull()

# get values for lower & upper "Pew breaks"
hhinc_lower <- ((2/3) * hhinc_idaho)
hhinc_upper <- (2 * hhinc_idaho)

# binned palette using Pew breaks & full-color univariate scale hex values
pal_hhinc <- colorBin(
  palette = c("#EDEDED", "#FF94C0", "#FF2C54"),
  domain = idaho$hhinc,
  bins = c(0, hhinc_lower, hhinc_upper, max(idaho$hhinc, na.rm = T))
)


# get median personal earnings for state
# returns 46477
pearn_state <- get_acs(
  geography = "state",
  variables = c("pearn" = "B20002_001"),
  survey = "acs5",
  year = 2017,
  key = "9af94992069cef5ed20227c7167a5916216d9705"
) %>% 
  select(estimate) %>% 
  pull()

# personal earnings Pew breaks
pearn_lower <- ((2/3) * pearn_state)
pearn_upper <- (2 * pearn_state)

# palette using same colors as hh income
pal_pearn <- colorBin(
  palette = c("#EDEDED", "#FF94C0", "#FF2C54"),
  domain = idaho$pearn,
  bins = c(0, pearn_lower, pearn_upper,  max(idaho$pearn, na.rm = T))
)


# gini scale, using breaks at terciles (1/3-quantiles)
gini_state <- get_acs(
  geography = "state",
  variables = c("gini" = "B20002_001"),
  survey = "acs5",
  year = 2017,
  key = "9af94992069cef5ed20227c7167a5916216d9705"
) %>% 
  select(estimate) %>% 
  pull()


pal_gini <- colorQuantile(
  palette = c("#EDEDED", "#94C6E7", "#4CB1DF"),
  domain = idaho$gini,
  probs = seq(0, 1, length.out = 4)
)

Bivariate map

Two layers: household income and personal earnings

m <- idaho %>% 
  leaflet(
    width = "100%",
    options = leafletOptions(zoomSnap = 0.25, zoomDelta = 0.5)
  ) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  # HH Income Layer
  addPolygons(
    group = "Household Income",
    fillColor = ~pal_hhinc(idaho$hhincE),
    fillOpacity = 0.5,
    stroke = FALSE,
    smoothFactor = 0
  ) %>%
# Personal Earnings Layer
  addPolygons(
    group = "Personal Earnings",
    fillColor = ~pal_pearn(idaho$pearnE),
    fillOpacity = 0.5,
    stroke = FALSE,
    smoothFactor = 0
  ) %>% 
  # Gini Layer
  addPolygons(
    group = "Gini Coefficient",
    fillColor = ~pal_gini(idaho$giniE),
    fillOpacity = 0.5,
    stroke = FALSE,
    smoothFactor = 0
  ) %>% 
  # income measure switching
  addLayersControl(
    baseGroups = c("Household Income", "Personal Earnings"),
    options = layersControlOptions(collapsed = FALSE),
    position = "topright"
  ) %>% 
  # forces income measure layer to bottom on switch
  htmlwidgets::onRender("
    function(el, x) {
      this.on('baselayerchange', function(e) {
        e.layer.bringToBack();
      })
    }
  ")

m
# data frame with no geographic data
nogeog <- st_drop_geometry(idaho)

# list of HTML for each observation (census tract)
labs <- lapply(seq(nrow(nogeog)), function(i) {
  paste0(
    "Median Household Income: $", prettyNum(nogeog[i, "hhincE"], big.mark = ","), "<br>",
    "Median Presonal Earnings: $", prettyNum(nogeog[i, "pearnE"], big.mark = ","), "<br>",
    "Gini Coefficient: ", round(as.numeric(nogeog[i, "giniE"]), 3)
  ) 
})

Show us your final product

Defining the breaks and sticking the LEGEND on the interactive map is the hardest part of this whole thing. There is no native covariate legend available in R (yet) so there are several creative ways I’ve seen. In an interactive map, this is particularly difficult because there isn’t an easy way to insert a local image (which isn’t reproducible). The only solution I’ve found are to add as a image that is hosted from the same server, which requires an html link to the image. The other option I can think of would be include the legend as a logo which I believe also requires an html link. The other thing to consider is how the breaks or bins of data are defined to be referenced as a color on the bivariate color scale. These breaks define how the data is visualized, and so taking care to define them in a logical way is very important.

Reflection

Converting a static covariate map to an interavtive version increases the amount of information available to the user, as well as ease of use. The interactive covariate map can show exact values of data which are somewhat concealed in a static covariate map. I particuarly liked the mouseover feature in this vignette, compared to a point and click popup. This seems much more intuitive to use, though I can see it may be overwhelming on a larger map. While the toggle between income type in this particular example doesn’t seem particuarly useful, the next steps would be to use this feature for different data that would be more revealing. I still wish I understood the best ways to calculate breaks, and how to host a legend image without saving and uploading. Next steps will be to experiment with different break calculations and convert this map to a shiny application, withe the ability to plot the data in different ways.

Mapping References

Reading in NetCDF data in R and exporting as a geotiff https://chris35wills.github.io/netcdf-R/

Velox extract and summarize raster data https://philipphunziker.com/velox/extract.html#using_sf_objects

Merging Spatial Data http://www.nickeubank.com/wp-content/uploads/2015/10/RGIS2_MergingSpatialData_part1_Joins.html by Nick Eubank, building off excellent tutorials by Claudia Engel

Bivariate chloropleth map with leaflet https://davidelambert.com/post/bivarite-choropleths-in-leaflet-for-r/

Interactive Maps with Shiny https://spatialanalysis.github.io/workshop-notes/interactive-maps-with-shiny.html#interactive-tutorial-7

Interactive Choropleths with Shiny and Leaflet by Sean Angiolillo https://sean.rbind.io/project/electricity-latrine/ https://humansofdata.atlan.com/2019/01/shiny-electricity-latrine-water-india/#Working_with_Leaflet_in_Shiny

Shiny App Examples https://shiny.rstudio.com/gallery/